SUMMARY


Robust  estimates  for  the  parameters  in  the  general  linear  model  are 
proposed  which  are  based  on  weighted  rank  statistics.  The  method  is  based 
on  the  minimization  of  a  dispersion  function  defined  by  a  weighted  Gini's 
mean  difference.  An  asymptotic  distribution  of  the  estimate  is  derived. 
Some  examples  are  discussed  which  point  out  that  the  ranking  can  be  based 
on  a  restricted  set  of  comparisons  and  still  retain  high  efficiency. 


Key  words:  Robust  estimates,  linear  models,  weighted  rank  statistics, 
dispersion  function,  Gini  mean  difference 


1.  INTRODUCTION 


Consider the linear model

(1.1)  $Y = \beta_0 1 + X\beta + e,$

where $Y = (Y_1, \ldots, Y_n)'$ is an n x 1 random vector, $1$ is an n x 1 vector with each element equal to one, $X = (x_{ij})$ is an n x p design matrix, $\beta = (\beta_1, \ldots, \beta_p)'$ is a p x 1 vector of parameters and $e = (e_1, \ldots, e_n)'$ is an n x 1 vector of random errors. Assume that $e_1, e_2, \ldots, e_n$ are independent with a common distribution having density function $f$. The residuals are given by $Z = (Z_1, \ldots, Z_n)'$ where

$Z = Z(\beta) = Y - X\beta.$

Methods of estimation of $\beta$ are typically based on some principle
of making the residuals small. The classical least-squares approach is
to  minimize  the  sum  of  squares  of  the  residuals.  The  resulting  estimate 
is  optimal  under  normality  assumptions.  However,  the  least-squares 
estimate  is  not  robust  in  the  face  of  departures  from  the  model.  It  can 
be  inefficient  when  the  error  terms  follow  a  non-normal  distribution  and 
it  can  be  very  sensitive  to  outliers  and  high  leverage  points  in  the 
design  matrix.  These  problems  with  least-squares  estimates  have  spurred 
the  development  of  other  types  of  estimates  which  are  more  robust. 

There  has  been  considerable  work  in  recent  years  on  the  M-estimate 
approach  and  on  the  method  of  minimizing  the  sum  of  the  absolute  values 
of  the  residuals  (Least  Absolute  Deviation  estimates) .  Methods  based  on 
rank statistics have also been proposed. With regard to the rank statistic
approach, basic material can be found in the papers of Jurečková (1969),
(1971); Koul (1970), (1971); Adichie (1978) and Sen-Puri (1977).

Jaeckel  (1972)  has  discussed  the  value  in  using  a  dispersion  function 
and  in  defining  the  estimates  to  be  the  values  of  the  parameters  that 
minimize  the  dispersion  of  the  residuals.  He  showed  how  estimates  based 
on  linear  rank  statistics  can  arise  with  a  suitable  choice  of  dispersion 
function.  This  approach  has  been  further  extended  by  Hettmansperger  and 
McKean  (1976),  (1977),  (1978b). 

This paper will examine the estimate of $\beta$ that arises with a dispersion function defined as a weighted Gini's mean difference. Gini's mean difference is a familiar measure of dispersion and it has been proposed for the linear model problem by Wainer and Thissen (1976). The use of weights adds greater flexibility. The asymptotic theory of the partial derivatives of the dispersion function will be examined and an asymptotic linearity result is given. This dispersion function is shown to be asymptotically, locally quadratic. These results are used to establish the asymptotic distribution of the proposed estimate of $\beta$. The paper concludes with some comments on the weights and some applications. Proofs of the theorems can be found in the Appendix.
2. THE DISPERSION FUNCTION

Consider the dispersion function

(2.1)  $D = D(\beta) = \sum_{i<j} b_{ij} |Z_i - Z_j|,$

where the $b_{ij} \ge 0$, $1 \le i < j \le n$, are a given set of weights. Each pair of residuals is compared by the absolute difference and the weight that is attached can reflect the importance of the comparison. Note that the weights can depend on the design matrix $X$. It is possible to have some of the weights equal to zero and this will drop some pairwise comparisons from consideration. The special case of equal weights, $b_{ij} = 1$, gives rise to Gini's mean difference. Hettmansperger and McKean (1978a) have shown that this dispersion function is equivalent to Jaeckel's dispersion function with Wilcoxon scores.

The dispersion function D can be expressed in another form. Let $(R_1, \ldots, R_n)$ denote the ranks of the residuals; that is, $R_i$ is the rank of $Z_i$ in the set $\{Z_1, \ldots, Z_n\}$, $1 \le i \le n$. Let $\mathrm{sgn}(v) = +1, 0, -1$ as $v$ is $> 0$, $= 0$, $< 0$. Extend the definition of the weights $b_{ij}$ to all subscripts $i, j = 1, \ldots, n$ by using $b_{ji} = b_{ij}$ and $b_{ii} = 0$. Then, using $|v| = v\,\mathrm{sgn}(v)$, some manipulation shows

(2.2)  $D = \sum_{i=1}^{n} B_i Z_i,$

with $B_i = \sum_{j \ne i} b_{ij}\,\mathrm{sgn}(Z_i - Z_j)$, $i = 1, \ldots, n$.

The coefficients $B_i$ are random with $B_i$ depending on the rank of $Z_i$ and also on the subscripts of the residuals that are less than $Z_i$. In the special case $b_{ij} = 1$, $B_i = 2R_i - (n + 1)$.
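The equivalence of forms (2.1) and (2.2) can be checked numerically. The following is a minimal sketch in Python; the residuals and weights are made-up illustrative values, and the function names (`dispersion_pairs`, `rank_coefficients`, `dispersion_ranks`) are hypothetical, not taken from the paper.

```python
from itertools import combinations

def sgn(v):
    """Sign function: +1, 0, -1 as v is > 0, = 0, < 0."""
    return (v > 0) - (v < 0)

def dispersion_pairs(z, b):
    """Form (2.1): sum over i<j of b[i][j] * |z_i - z_j|."""
    return sum(b[i][j] * abs(z[i] - z[j])
               for i, j in combinations(range(len(z)), 2))

def rank_coefficients(z, b):
    """B_i = sum over j != i of b[i][j] * sgn(z_i - z_j), with b symmetric."""
    n = len(z)
    return [sum(b[i][j] * sgn(z[i] - z[j]) for j in range(n) if j != i)
            for i in range(n)]

def dispersion_ranks(z, b):
    """Form (2.2): sum over i of B_i * z_i."""
    return sum(Bi * zi for Bi, zi in zip(rank_coefficients(z, b), z))

# Made-up residuals (no ties) and made-up symmetric weights with zero diagonal.
z = [0.3, -1.2, 2.5, 0.9, -0.4]
b = [[0, 2, 0, 1, 3],
     [2, 0, 1, 0, 1],
     [0, 1, 0, 4, 0],
     [1, 0, 4, 0, 2],
     [3, 1, 0, 2, 0]]
n = len(z)
unit = [[int(i != j) for j in range(n)] for i in range(n)]   # b_ij = 1
ranks = [sorted(z).index(v) + 1 for v in z]                  # valid without ties

print(dispersion_pairs(z, b), dispersion_ranks(z, b))
print(rank_coefficients(z, unit), [2 * r - (n + 1) for r in ranks])
```

The two printed dispersions should agree up to rounding, and with unit weights the coefficients should reduce to $2R_i - (n + 1)$.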

Another dispersion function which is similar to D is

$D^* = \sum_{i<j} b_{ij} |Z_{(j)} - Z_{(i)}|,$

in which the weights correspond to the ordered residuals $Z_{(1)} \le \cdots \le Z_{(n)}$. It can be shown that

$D^* = \sum_{i=1}^{n} B_i^* Z_{(i)},$

where $B_i^* = \sum_{j=1}^{i-1} b_{ij} - \sum_{j=i+1}^{n} b_{ij}$.

In this form, it can be seen that D* is equivalent to Jaeckel's dispersion function with the $B_i^*$ serving as the score function. If $b_{ij} = 1$, then D = D*. They are not equal in general. This shows that the weights used in D serve a different purpose than the score function used by Jaeckel.
3. PARTIAL DERIVATIVES OF D

To estimate the parameter $\beta$, consider using a point in the parameter
space which minimizes the dispersion function $D(\beta)$ of (2.1). This
function  is  nonnegative,  piecewise  linear  and  convex.  Various  numerical 
methods,  including  linear  programming  algorithms,  can  be  used  to  determine 
an  estimate.  The  solution  is  not  unique  in  general.  However,  under  some 
conditions,  it  follows  from  the  work  in  section  5  that  the  diameter  of 
the  set  of  solutions  tends  to  zero  asymptotically. 
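For a single regressor (p = 1) the minimization can be illustrated without linear programming. Since $D(\beta) = \sum_{i<j} b_{ij}|x_j - x_i|\,|s_{ij} - \beta|$ with pairwise slopes $s_{ij} = (Y_j - Y_i)/(x_j - x_i)$, a weighted median of the slopes is a minimizer. The sketch below uses this standard fact; it is not presented as the paper's algorithm, and the data and function names are hypothetical.

```python
from itertools import combinations

def dispersion(beta, x, y, b):
    """D(beta) of (2.1) for one regressor, residuals z_i = y_i - beta * x_i."""
    z = [yi - beta * xi for xi, yi in zip(x, y)]
    return sum(b[i][j] * abs(z[i] - z[j])
               for i, j in combinations(range(len(z)), 2))

def minimize_dispersion_1d(x, y, b):
    """Return one minimizer of D: a weighted median of the pairwise slopes."""
    slopes = []
    for i, j in combinations(range(len(x)), 2):
        weight = b[i][j] * abs(x[j] - x[i])
        if weight > 0:                      # pairs with zero weight drop out
            slopes.append(((y[j] - y[i]) / (x[j] - x[i]), weight))
    if not slopes:
        raise ValueError("no informative pairs")
    slopes.sort()
    half = sum(wt for _, wt in slopes) / 2.0
    running = 0.0
    for slope, wt in slopes:
        running += wt
        if running >= half:                 # first slope reaching half the weight
            return slope

# Made-up data with one gross outlier in the last response.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [1.1, 1.9, 3.2, 3.9, 30.0]
unit = [[int(i != j) for j in range(len(x))] for i in range(len(x))]
beta_hat = minimize_dispersion_1d(x, y, unit)
print(beta_hat, dispersion(beta_hat, x, y, unit))
```

Because D is convex and piecewise linear, checking that D does not decrease when `beta_hat` is perturbed in either direction is enough to confirm a minimizer; the solution need not be unique.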

The partial derivatives of D should be (approximately) equal to zero at the minimum. Using form (2.2), these derivatives are

(3.1)  $\partial D/\partial\beta_k = -\sum_{i=1}^{n} B_i x_{ik}$

for $k = 1, \ldots, p$, at points $\beta$ where they exist. Another form of the derivatives, that can be seen by writing $D = \sum_{i<j} b_{ij}\,\mathrm{sgn}(Z_j - Z_i)(Z_j - Z_i)$, is

(3.2)  $\partial D/\partial\beta_k = -\sum_{i<j} b_{ij}\,\mathrm{sgn}(Z_j - Z_i)(x_{jk} - x_{ik})$
       $= -2\sum_{i<j} b_{ij}(x_{jk} - x_{ik})\,\phi(Z_i, Z_j) + \sum_{i<j} b_{ij}(x_{jk} - x_{ik}),$

where $\phi(u, v) = (\mathrm{sgn}(v - u) + 1)/2 = 0, 1/2, 1$ as $u > v$, $u = v$, $u < v$. In this form, the derivatives can be seen to depend on the rank order of the residuals. They involve a general type of random variable of the form

$\sum_{i<j} a_{ij}\,\phi(Z_i, Z_j),$

which will be considered in more detail in the next section.

The form of the derivatives in (3.2) can be changed to

$\partial D/\partial\beta_k = \sum_{i<j} b_{ij}\,|x_{jk} - x_{ik}|\,\mathrm{sgn}(x_{jk} - x_{ik})\,\mathrm{sgn}(Z_i - Z_j),$

for $k = 1, \ldots, p$. This is a "weighted" Kendall's tau random variable for $Z$ vs $x_k$. Thus when the partial derivatives are zero, the residuals are uncorrelated with the independent variables in this nonparametric sense. This is directly analogous to the least-squares approach where the least-squares estimate of $\beta$ can be defined by specifying that the residuals be uncorrelated (Pearson product moment correlation) with the independent variables.
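The rank form (3.1) and the weighted Kendall's tau form of the derivatives can be compared against a finite-difference gradient of D. The following is a sketch under stated assumptions: the design, responses, weights and evaluation point are made up, the residuals at that point have no ties (so the derivatives exist), and all function names are hypothetical.

```python
from itertools import combinations

def sgn(v):
    return (v > 0) - (v < 0)

def residuals(beta, X, y):
    """z_i = y_i - x_i' beta."""
    return [yi - sum(bk * xk for bk, xk in zip(beta, row))
            for row, yi in zip(X, y)]

def dispersion(beta, X, y, b):
    z = residuals(beta, X, y)
    return sum(b[i][j] * abs(z[i] - z[j])
               for i, j in combinations(range(len(z)), 2))

def gradient_rank_form(beta, X, y, b):
    """(3.1): dD/dbeta_k = -sum_i B_i x_ik."""
    z = residuals(beta, X, y)
    n, p = len(X), len(X[0])
    B = [sum(b[i][j] * sgn(z[i] - z[j]) for j in range(n) if j != i)
         for i in range(n)]
    return [-sum(B[i] * X[i][k] for i in range(n)) for k in range(p)]

def gradient_tau_form(beta, X, y, b):
    """sum_{i<j} b_ij |x_jk - x_ik| sgn(x_jk - x_ik) sgn(z_i - z_j)."""
    z = residuals(beta, X, y)
    n, p = len(X), len(X[0])
    return [sum(b[i][j] * abs(X[j][k] - X[i][k])
                * sgn(X[j][k] - X[i][k]) * sgn(z[i] - z[j])
                for i, j in combinations(range(n), 2))
            for k in range(p)]

def gradient_numeric(beta, X, y, b, h=1e-6):
    """Central differences; exact up to rounding away from the kinks of D."""
    grad = []
    for k in range(len(beta)):
        up, down = list(beta), list(beta)
        up[k] += h
        down[k] -= h
        grad.append((dispersion(up, X, y, b) - dispersion(down, X, y, b)) / (2 * h))
    return grad

X = [[1.0, 0.5], [2.0, -1.0], [3.0, 2.0], [4.0, 0.0], [5.0, 1.5]]
y = [2.1, 3.7, 7.9, 8.2, 12.4]
b = [[0, 2, 0, 1, 3],
     [2, 0, 1, 0, 1],
     [0, 1, 0, 4, 0],
     [1, 0, 4, 0, 2],
     [3, 1, 0, 2, 0]]
beta = [1.5, 0.5]
print(gradient_rank_form(beta, X, y, b))
print(gradient_tau_form(beta, X, y, b))
print(gradient_numeric(beta, X, y, b))
```

All three gradients should agree up to rounding at this point.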
4. A GENERAL CLASS OF RANDOM VARIABLES

In this section a general class of random variables, related to the derivatives of the dispersion function, is defined. An asymptotic normality result is given with the proof delayed until the Appendix.

For each $k = 1, \ldots, p$, let a set of constants $\{a_{ij}(k): 1 \le i < j \le n\}$ be given. Let

$a_{i\cdot}(k) = \sum_{j=i+1}^{n} a_{ij}(k)$  for $i = 1, \ldots, n - 1$

$a_{\cdot j}(k) = \sum_{i=1}^{j-1} a_{ij}(k)$  for $j = 2, \ldots, n$

$a_{n\cdot}(k) = 0$,  $a_{\cdot 1}(k) = 0$,  $a_{\cdot\cdot}(k) = \sum_{i<j} a_{ij}(k)$

$A_i(k) = a_{\cdot i}(k) - a_{i\cdot}(k).$

For asymptotic purposes, a sequence of these constants is needed, indexed on $n = 1, 2, \ldots$, but this dependence on n will be suppressed in the notation. In a similar fashion, the dependence on n of other quantities will not be indicated in the notation.
ASSUMPTION (A1): For each $k = 1, \ldots, p$,

$\sum_{i=1}^{n} A_i^2(k) \Big/ \max_{1 \le i \le n} A_i^2(k) \to \infty$  as $n \to \infty$.


ASSUMPTION (A2): For each $k = 1, \ldots, p$,

$\sum_{i<j} a_{ij}^2(k) \Big/ \sum_{i=1}^{n} A_i^2(k) \to 0$  as $n \to \infty$.


Define the random variables

$T_k = T_k(\beta) = \sum_{i<j} a_{ij}(k)\,\phi(Z_i, Z_j),$

for $k = 1, \ldots, p$, where $\phi(u, v) = 0, 1/2, 1$ as $u > v$, $u = v$, $u < v$. Let $T = T(\beta) = (T_1, \ldots, T_p)'$ be the p x 1 vector of these random variables. Note that this type of random variable arises in the derivatives of the dispersion function in (3.2) with the correspondence $a_{ij}(k) = b_{ij}(x_{jk} - x_{ik})$. Specifically,

(4.1)  $\partial D/\partial\beta_k = -2T_k(\beta) + a_{\cdot\cdot}(k).$

In order to consider the asymptotic distribution of $T(\beta)$, the following notation is introduced.
Specify a sequence of parameter values, contiguous to 0, by assuming $\beta = \Delta/\sqrt{n}$, where $\Delta' = (\Delta_1, \ldots, \Delta_p)$ is a fixed vector.

Let the centered design matrix be $X_c = (I_n - (1/n)J_n)X$, where $I_n$ is an n x n identity matrix and $J_n$ is an n x n matrix of "ones".

Let $\bar x_k$ be the average of the kth column of X, $k = 1, \ldots, p$.

Let $A_n$ be an n x p matrix with (i, k)th element equal to $A_i(k)$, $i = 1, \ldots, n$, $k = 1, \ldots, p$, and let $V_n = A_n' A_n$. Let $a_{\cdot\cdot} = (a_{\cdot\cdot}(1), \ldots, a_{\cdot\cdot}(p))'$ and let

$\mu_n = (\mu_n(1), \ldots, \mu_n(p))' = (\int f^2)\, A_n' X_c \beta + (1/2)\, a_{\cdot\cdot}.$

ASSUMPTION (A3): For $k = 1, \ldots, p$,

$\max_{1 \le i \le n} |x_{ik} - \bar x_k| / \sqrt{n} \to 0$  as $n \to \infty$.

ASSUMPTION (A4):

$(1/n) X_c' X_c \to \Sigma$  as $n \to \infty$,

where $\Sigma$ is a p x p, positive definite matrix.

ASSUMPTION (A5): There exists a sequence of constants $\{\gamma_n\}$ such that

$\gamma_n V_n \to V$  as $n \to \infty$,

where V is a p x p, positive definite matrix.
THEOREM 4.1  Assume the error density f is absolutely continuous and $\int (f'/f)^2 f\,dx < \infty$. Let assumptions (A1), (A2), (A3), (A4) and (A5) hold. Then, if $\beta = \Delta/\sqrt{n}$,

$\gamma_n^{1/2}(T(0) - \mu_n) \xrightarrow{D} N(0, (1/12)V)$  as $n \to \infty$.

The notation "$\xrightarrow{D}$" reads "converges in distribution". A translation property of the result can be noted since

$T(0)|_{\beta_2 - \beta_1} = T(\beta_1 - \beta_2)|_{0} = T(\beta_1)|_{\beta_2},$

where $T(\beta_1)|_{\beta_2}$ refers to the distribution of $T(\beta_1)$ when $\beta = \beta_2$.
5. ASYMPTOTIC LINEARITY

In this section, a local, asymptotic linearity result is given for $T(\beta)$. The proof can be found in the Appendix.

Let $\Delta = (\Delta_1, \ldots, \Delta_p)'$ denote a p x 1 vector and let $c > 0$ be given. Define a set $\mathcal{D} = \{\Delta: -c \le \Delta_k \le c,\ k = 1, \ldots, p\}$. Let a p x p matrix $C_n$ be defined with (k, ℓ) element $c_{k\ell} = -(\int f^2) \sum_{i<j} a_{ij}(k)(x_{j\ell} - x_{i\ell})$. Let

(5.1)  $R(\Delta) = n^{-3/2}[T(\Delta/\sqrt{n}) - T(0) - C_n(\Delta/\sqrt{n})].$

Let $G(y) = P(e_1 - e_2 \le y)$ denote the cdf of the difference of independent random variables, each with density f.

Let $\|\cdot\|$ denote Euclidean distance.

ASSUMPTION (A6): The cdf G has a density $g = G'$ and $g(y)$ is continuous at $y = 0$.

ASSUMPTION (A7): For each $k = 1, \ldots, p$,

$\sum_{i<j} a_{ij}^2(k) \Big/ \binom{n}{2}$  is bounded as $n \to \infty$.

LEMMA 5.1  Let assumptions (A3), (A4), (A6) and (A7) hold. If $\beta = 0$, then $R(\Delta) \xrightarrow{P} 0$, uniformly in $\Delta \in \mathcal{D}$. (That is, for all $\varepsilon > 0$ and $\delta > 0$, there exists N such that $P(\|R(\Delta)\| \ge \varepsilon) \le \delta$ for all $n \ge N$ and all $\Delta \in \mathcal{D}$.)
The lemma shows that $T(\beta)$ can be approximated by the linear function $T(0) + C_n\beta$ asymptotically for $\beta$ sufficiently near zero; however, the result is not strong enough for the application needed. A stronger result is given in the following theorem.

THEOREM 5.1  Let assumptions (A3), (A4), (A6) and (A7) hold. If $\beta = 0$, then $\sup_{\Delta \in \mathcal{D}} \|R(\Delta)\| \xrightarrow{P} 0$ as $n \to \infty$.
6. DISTRIBUTION OF $\hat\beta$

The estimate of $\beta$ has been defined to be a point in the parameter space, say $\hat\beta_n$, which minimizes the dispersion function $D(\beta)$ of (2.1). The set of solutions to this minimization problem is bounded as seen by the following lemma. The proof of this lemma is exactly the same as that of Theorem 2 of Jaeckel (1972).

LEMMA 6.1  If the centered design matrix $X_c$ is of full rank p, then $\{\beta: D(\beta) \le D_0\}$ is bounded for any number $D_0$.

In order to deal with the asymptotic distribution of $\hat\beta_n$, it is convenient to work with $\hat\Delta_n = \sqrt{n}\,\hat\beta_n$ and define

$D^*(\Delta) = (1/n)\,D(\Delta/\sqrt{n}).$

Then $\hat\Delta_n$ minimizes $D^*(\Delta)$.

To match the dispersion function $D^*$ to the T vector of section 4, use the correspondence

(6.1)  $a_{ij}(k) = b_{ij}(x_{jk} - x_{ik}).$

Then from formula (4.1), the vector of partial derivatives is

$\partial D^*(\Delta)/\partial\Delta = n^{-3/2}[-2T(\Delta/\sqrt{n}) + a_{\cdot\cdot}].$
With (6.1) we can give more definite expressions for some matrices that were defined earlier. The elements of the matrix $A_n$ of section 4 are linear functions of the elements $x_{ij}$ of the design matrix and with some manipulation, write

$A_n = B_n X,$

where $B_n$ is an n x n symmetric matrix involving the weights of the dispersion function. Specifically, define the (i, j)th element of $B_n$ to be $-b_{ij}$ if $i < j$ and $-b_{ji}$ if $i > j$. The ith diagonal element of $B_n$ is $\sum_{j \ne i} b_{ij}$. Thus $B_n$ has the negatives of the dispersion function weights for its off-diagonal elements and positive diagonal elements determined so that the row sums are zero. Also write

$V_n = A_n' A_n = X' B_n B_n X.$

Again with (6.1), the matrix $C_n$ of section 5 will have (k, ℓ)th element $c_{k\ell} = -(\int f^2) \sum_{i<j} b_{ij}(x_{jk} - x_{ik})(x_{j\ell} - x_{i\ell})$. With some further manipulation, it follows that

$C_n = -(\int f^2)\, X' B_n X.$

The matrices $V_n$ and $C_n$ have been expressed directly in terms of the design matrix X and the weight matrix $B_n$. Since the row (and column) sums of $B_n$ are zero, the centered design matrix could be used with

$V_n = X_c' B_n B_n X_c$

$C_n = -(\int f^2)\, X_c' B_n X_c.$
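The matrix identities in this passage can be checked on a small made-up example. The sketch below builds $B_n$ from pairwise weights, forms $A_i(k)$ directly from the definitions of section 4 with the correspondence (6.1), and compares with $B_n X$; it also compares $X' B_n X$ with the pairwise sum appearing in $c_{k\ell}$ (without the factor $-\int f^2$). Integer data keep the comparisons exact. Function names and data are hypothetical.

```python
from itertools import combinations

def matmul(A, B):
    return [[sum(a * c for a, c in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def weight_matrix(b):
    """B_n: off-diagonal entries -b_ij, diagonal sum_{j != i} b_ij."""
    n = len(b)
    return [[sum(b[i][l] for l in range(n) if l != i) if i == j else -b[i][j]
             for j in range(n)] for i in range(n)]

def A_from_definition(b, X):
    """A_i(k) = a_{.i}(k) - a_{i.}(k) with a_ij(k) = b_ij (x_jk - x_ik)."""
    n, p = len(X), len(X[0])
    A = [[0] * p for _ in range(n)]
    for k in range(p):
        for i, j in combinations(range(n), 2):
            a_ij = b[i][j] * (X[j][k] - X[i][k])
            A[j][k] += a_ij      # a_ij(k) enters the sum a_{.j}(k)
            A[i][k] -= a_ij      # a_ij(k) enters the sum a_{i.}(k)
    return A

def pairwise_form(b, X):
    """sum_{i<j} b_ij (x_j - x_i)(x_j - x_i)' as a p x p matrix."""
    n, p = len(X), len(X[0])
    S = [[0] * p for _ in range(p)]
    for i, j in combinations(range(n), 2):
        d = [X[j][k] - X[i][k] for k in range(p)]
        for k in range(p):
            for l in range(p):
                S[k][l] += b[i][j] * d[k] * d[l]
    return S

b = [[0, 2, 0, 1, 3],
     [2, 0, 1, 0, 1],
     [0, 1, 0, 4, 0],
     [1, 0, 4, 0, 2],
     [3, 1, 0, 2, 0]]
X = [[1, 0], [2, 1], [3, 0], [4, 2], [5, 1]]
Bn = weight_matrix(b)
print(A_from_definition(b, X) == matmul(Bn, X))
print(matmul(matmul(transpose(X), Bn), X) == pairwise_form(b, X))
```

Both comparisons should print `True`, and every row of `Bn` sums to zero.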


ASSUMPTION (A8): With (6.1) holding,

$(1/n^2)\, C_n \to C$  as $n \to \infty$,

where C is a p x p matrix of full rank.

Now define a quadratic function of $\Delta$ to use as an approximation to $D^*(\Delta)$ by

$Q(\Delta) = -\Delta' C \Delta + n^{-3/2}[a_{\cdot\cdot} - 2T(0)]'\Delta + D^*(0).$

Then

$\partial Q(\Delta)/\partial\Delta = -2C\Delta + n^{-3/2}[a_{\cdot\cdot} - 2T(0)].$

LEMMA 6.2  Let assumption (A8) hold along with the assumptions of Theorem 5.1. Then

$\sup_{\Delta \in \mathcal{D}} n^{-3/2} \| T(\Delta/\sqrt{n}) - T(0) - n^{3/2} C \Delta \| \xrightarrow{P} 0$  as $n \to \infty$.

LEMMA 6.3  Let the assumptions of Lemma 6.2 hold. Then

$\sup_{\Delta \in \mathcal{D}} \| \partial Q(\Delta)/\partial\Delta - \partial D^*(\Delta)/\partial\Delta \| \xrightarrow{P} 0$  as $n \to \infty$.

LEMMA 6.4  Let the assumptions of Lemma 6.2 hold. Then

$\sup_{\Delta \in \mathcal{D}} |Q(\Delta) - D^*(\Delta)| \xrightarrow{P} 0$  as $n \to \infty$.
Having established that $Q(\Delta)$ can be used to approximate $D^*(\Delta)$, it will next be shown that $\Delta$ values minimizing these functions must be close enough to have the same asymptotic distribution.

On setting the partial derivatives equal to zero, $Q(\Delta)$ is minimized at the point

$\Delta^* = n^{-3/2}\, C^{-1}[(1/2)a_{\cdot\cdot} - T(0)].$

THEOREM 6.1  Let assumption (A8) hold along with the assumptions of Theorem 5.1. Let assumption (A5) hold with $\gamma_n = n^{-3}$. Then $\hat\Delta_n$ and $\Delta^*$ are asymptotically equivalent. (See definition on page 1453 of Jaeckel (1972).)

Now if Theorems 4.1 and 6.1 hold, when $\beta = 0$

$n^{-3/2}[T(0) - (1/2)a_{\cdot\cdot}] \xrightarrow{D} N(0, (1/12)V)$  as $n \to \infty$.

It follows that $\Delta^*$ is asymptotically $N(0, (1/12)\,C^{-1} V C^{-1})$ and the following result is immediate.

COROLLARY 6.1  Under the assumptions of Theorems 4.1 and 6.1, when $\beta = 0$

$\sqrt{n}\,\hat\beta_n = \hat\Delta_n$  is asymptotically  $N(0, (1/12)\,C^{-1} V C^{-1})$.

Note that from the translation invariance of the estimate,

$\sqrt{n}(\hat\beta_n - \beta) \xrightarrow{D} N(0, (1/12)\,C^{-1} V C^{-1})$  for any $\beta$.

If $n\,C_n^{-1} V_n C_n^{-1}$ is used as an approximation to $C^{-1} V C^{-1}$, the asymptotic covariance matrix of $\hat\beta_n$ is approximately

(6.2)  $\mathrm{cov}(\hat\beta_n) = (1/12)\, C_n^{-1} V_n C_n^{-1} = (1/(12(\int f^2)^2))\,(X' B_n X)^{-1}(X' B_n B_n X)(X' B_n X)^{-1}.$

Note that $X_c$ can be used in place of X in this formula. It should be emphasized that $\mathrm{cov}(\hat\beta_n)$ is not the exact covariance matrix. In spite of this, formula (6.2) may prove useful in examining the effect of different weight matrices on the estimate.

For the special case $b_{ij} = 1$, the unweighted case, $B_n = n(I_n - (1/n)J_n)$ and

(6.3)  $\mathrm{cov}(\hat\beta_n) = (1/(12(\int f^2)^2))\,(X_c' X_c)^{-1}.$

Note that $\mathrm{cov}(\hat\beta_n)$ depends on $B_n$ through the matrix $H = B_n X$ since $[(X' B_n X)(X' B_n B_n X)^{-1}(X' B_n X)]^{-1} = [X' H (H' H)^{-1} H' X]^{-1}$. The matrix $H(H'H)^{-1}H'$ is a projection matrix, the projection being into the column space of H.
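The matrix part of formula (6.2) is straightforward to evaluate for a given design and set of weights. The sketch below computes the "sandwich" $(X' B_n X)^{-1}(X' B_n B_n X)(X' B_n X)^{-1}$ in exact rational arithmetic and checks that unit weights reproduce $(X_c' X_c)^{-1}$ as in (6.3). The scalar factor $1/(12(\int f^2)^2)$ is omitted because it depends on the unknown error density; data and function names are made up for illustration.

```python
from fractions import Fraction

def matmul(A, B):
    return [[sum(a * c for a, c in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def inverse(A):
    """Gauss-Jordan inverse in exact rational arithmetic."""
    n = len(A)
    M = [[Fraction(v) for v in row] + [Fraction(int(i == j)) for j in range(n)]
         for i, row in enumerate(A)]
    for c in range(n):
        pivot_row = next(r for r in range(c, n) if M[r][c] != 0)
        M[c], M[pivot_row] = M[pivot_row], M[c]
        pivot = M[c][c]
        M[c] = [v / pivot for v in M[c]]
        for r in range(n):
            if r != c:
                factor = M[r][c]
                M[r] = [vr - factor * vc for vr, vc in zip(M[r], M[c])]
    return [row[n:] for row in M]

def weight_matrix(b):
    """B_n: off-diagonal entries -b_ij, diagonal sum_{j != i} b_ij."""
    n = len(b)
    return [[sum(b[i][l] for l in range(n) if l != i) if i == j else -b[i][j]
             for j in range(n)] for i in range(n)]

def sandwich(X, B):
    """(X'BX)^{-1} (X'BBX) (X'BX)^{-1}, the matrix part of (6.2)."""
    Xt = transpose(X)
    bread = inverse(matmul(matmul(Xt, B), X))
    meat = matmul(matmul(matmul(Xt, B), B), X)
    return matmul(matmul(bread, meat), bread)

def centered(X):
    n = len(X)
    means = [Fraction(sum(col), n) for col in zip(*X)]
    return [[Fraction(v) - m for v, m in zip(row, means)] for row in X]

X = [[1, 0], [2, 1], [3, 0], [4, 2], [5, 1]]
n = len(X)
unit = [[int(i != j) for j in range(n)] for i in range(n)]
b = [[0, 2, 0, 1, 3],
     [2, 0, 1, 0, 1],
     [0, 1, 0, 4, 0],
     [1, 0, 4, 0, 2],
     [3, 1, 0, 2, 0]]
Xc = centered(X)
unweighted = inverse(matmul(transpose(Xc), Xc))      # matrix part of (6.3)
weighted = sandwich(X, weight_matrix(b))
print(sandwich(X, weight_matrix(unit)) == unweighted)
print([weighted[k][k] - unweighted[k][k] for k in range(2)])
```

The first line should print `True`; the second prints the increase in each diagonal entry caused by the non-unit weights, which should be nonnegative.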
7. THE WEIGHT MATRIX $B_n$

In the unweighted case, $b_{ij} = 1$ and $B_n = n(I_n - (1/n)J_n)$. This is a benchmark case because of its simplicity. It also yields the highest asymptotic efficiency as can be seen by expressing the difference of the covariance matrices (6.2) and (6.3) as (excepting the constant multiple)

$(X_c' B_n X_c)^{-1}(X_c' B_n B_n X_c)(X_c' B_n X_c)^{-1} - (X_c' X_c)^{-1} = M M',$

where $M = (X_c' B_n X_c)^{-1} X_c' B_n - (X_c' X_c)^{-1} X_c'$. The matrix $M M'$ is positive semidefinite and as a result the trace and determinant of (6.2) cannot be less than that of (6.3).
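The identity for $M M'$ follows by direct expansion, using only the symmetry of $B_n$. As a worked step, with $M = (X_c' B_n X_c)^{-1} X_c' B_n - (X_c' X_c)^{-1} X_c'$:

```latex
\begin{align*}
M M' &= (X_c' B_n X_c)^{-1} (X_c' B_n B_n X_c) (X_c' B_n X_c)^{-1}
        - (X_c' B_n X_c)^{-1} (X_c' B_n X_c) (X_c' X_c)^{-1} \\
     &\quad - (X_c' X_c)^{-1} (X_c' B_n X_c) (X_c' B_n X_c)^{-1}
        + (X_c' X_c)^{-1} (X_c' X_c) (X_c' X_c)^{-1} \\
     &= (X_c' B_n X_c)^{-1} (X_c' B_n B_n X_c) (X_c' B_n X_c)^{-1} - (X_c' X_c)^{-1}.
\end{align*}
```

The two cross terms each reduce to $(X_c' X_c)^{-1}$ and the last term is $(X_c' X_c)^{-1}$, so three copies combine to a single $-(X_c' X_c)^{-1}$.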

Formulas (6.2) and (6.3) are equal if and only if $M = 0$ or $(X_c' B_n X_c)^{-1} X_c' B_n = (X_c' X_c)^{-1} X_c'$. Equivalently, $B_n X_c = X_c (X_c' X_c)^{-1} X_c' B_n X_c$. Since $X_c (X_c' X_c)^{-1} X_c'$ is the projection map into the column space of $X_c$, the equality will hold if and only if the columns of $B_n X_c$ are in the column space of $X_c$ ($B_n X_c = X_c G$, for some p x p matrix G).

The preceding work indicates that the use of weights $b_{ij} \ne 1$ cannot increase the efficiency of the estimate over the unweighted case.

It  may  be  possible  to  choose  weights  so  as  to  lose  little  or  no  efficiency 
and  yet  gain  in  some  other  respect.  This  matter  needs  further  study. 


By using weights $b_{ij} = 0$ or 1 we can reduce the number of terms in the dispersion function D by reducing the number of comparisons. This may have computational advantages. Some examples in a later section show that there need not be a loss in efficiency. With such a choice of weights the dispersion function depends on "restricted" ranks of the residuals. Specifically, using form (2.2),

$D = \sum_{i=1}^{n} (2R_i^* - (\#C_i + 1))\, Z_i,$

where $C_i = \{j: b_{ij} = 1 \text{ or } j = i,\ 1 \le j \le n\}$, the number of elements in $C_i$ is denoted by $\#C_i$ and $R_i^*$ is the rank of $Z_i$ in the set $\{Z_j: j \in C_i\}$.

The weights $b_{ij}$ are associated directly with the observations (not with the ordered observations) and can be chosen to depend on the design matrix X. It may be possible to reduce the effect of high leverage points in the design matrix with a suitable choice of weights.

When $X_c' X_c$ is nearly singular, the use of weights can reduce the effects of multicollinearity.

One possible approach to setting the weights is to assign a weight $w_i$ for each observation $i = 1, \ldots, n$ and then use $b_{ij} = w_i w_j$, $i \ne j$. Assume that $\sum_i w_i = 1$ for simplicity. Define an n x n diagonal matrix W to have ith diagonal element $w_i$ and a vector $w = (w_1, \ldots, w_n)'$. Then the weight matrix is

$B_n = W - w w'$
$\quad = (I_n - W J_n)\, W\, (I_n - J_n W)$
$\quad = (I_n - W J_n)(W - w w')(I_n - J_n W).$

Note that $B_n X$ has (i, j)th element $w_i(x_{ij} - \bar x_j^*)$, where

$\bar x_j^* = \sum_{i=1}^{n} w_i x_{ij}$

is a weighted average of the jth column of X. $B_n X$ is then a "weighted", "centered" design matrix. Overall, this approach seems to be worth further consideration because it is a simpler task to assign n individual weights than to deal with $\binom{n}{2}$ pairwise weights.
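The product-weight construction can be verified on a small example. The sketch below takes made-up observation weights $w_i$ summing to one, builds $B_n$ from $b_{ij} = w_i w_j$, and checks it against $W - w w'$, against the factored form $(I_n - W J_n) W (I_n - J_n W)$, and against the weighted-centering description of $B_n X$. Names and data are hypothetical; exact fractions avoid rounding issues.

```python
from fractions import Fraction

def matmul(A, B):
    return [[sum(a * c for a, c in zip(row, col)) for col in zip(*B)] for row in A]

def weight_matrix(b):
    """B_n: off-diagonal entries -b_ij, diagonal sum_{j != i} b_ij."""
    n = len(b)
    return [[sum(b[i][l] for l in range(n) if l != i) if i == j else -b[i][j]
             for j in range(n)] for i in range(n)]

# Made-up observation weights that sum to one.
w = [Fraction(1, 10), Fraction(2, 10), Fraction(3, 10), Fraction(4, 10)]
n = len(w)
X = [[1, 0], [2, 1], [3, 0], [4, 2]]

I = [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]
J = [[Fraction(1)] * n for _ in range(n)]
W = [[w[i] if i == j else Fraction(0) for j in range(n)] for i in range(n)]

# Pairwise weights b_ij = w_i w_j for i != j, zero on the diagonal.
b = [[w[i] * w[j] if i != j else Fraction(0) for j in range(n)] for i in range(n)]
Bn = weight_matrix(b)

W_minus_ww = [[W[i][j] - w[i] * w[j] for j in range(n)] for i in range(n)]
WJ, JW = matmul(W, J), matmul(J, W)
left = [[I[i][j] - WJ[i][j] for j in range(n)] for i in range(n)]
right = [[I[i][j] - JW[i][j] for j in range(n)] for i in range(n)]
factored = matmul(matmul(left, W), right)

# Weighted column averages and the weighted, centered design matrix.
xbar_star = [sum(w[i] * X[i][j] for i in range(n)) for j in range(len(X[0]))]
weighted_centered = [[w[i] * (X[i][j] - xbar_star[j]) for j in range(len(X[0]))]
                     for i in range(n)]

print(Bn == W_minus_ww, factored == W_minus_ww)
print(matmul(Bn, X) == weighted_centered)
```

All three comparisons should print `True`.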
8. EXAMPLES

Example 1  Suppose there are $n_1$ observations (Group 1) following the model

$Y_i = \beta_0 + \beta_1 x_{i1} + \cdots + \beta_p x_{ip} + e_i, \quad i = 1, \ldots, n_1$

and another $n_2$ observations (Group 2) following the model

$Y_i = \beta_0^* + \beta_1 x_{i1} + \cdots + \beta_p x_{ip} + e_i^*, \quad i = n_1 + 1, \ldots, n_1 + n_2 = n.$

Note the different intercept parameters and different error variables with possibly different distributions. Actually, the possibility of different error distributions has not been covered in the work of this paper, but the necessary modifications should be possible. Suppose the goal is to estimate $\beta_1, \ldots, \beta_p$. In some situations it may not be appropriate to compare observations in different groups. The groups may refer to different types of people, locations, times or some other blocking variable. In such a case, the between group comparisons can be excluded by using

$b_{ij} = \begin{cases} 1/n_1 & \text{if } 1 \le i < j \le n_1 \\ 1/n_2 & \text{if } n_1 + 1 \le i < j \le n_1 + n_2 \\ 0 & \text{otherwise.} \end{cases}$
Then

$D = (1/n_1) \sum_{i<j\le n_1} |Z_i - Z_j| + (1/n_2) \sum_{n_1<i<j} |Z_i - Z_j|$

and the weight matrix is

$B_n = \begin{pmatrix} I_{n_1} - (1/n_1) J_{n_1} & 0 \\ 0 & I_{n_2} - (1/n_2) J_{n_2} \end{pmatrix}.$

This is an idempotent matrix and the covariance formula (6.2) reduces to

(8.1)  $\mathrm{cov}(\hat\beta_1, \ldots, \hat\beta_p) = (1/(12(\int f^2)^2))\,(X' B_n X)^{-1}.$

The usual approach to the analysis, when the error variables all have the same distribution, is to define an indicator variable for groups

$x_{i,p+1} = \begin{cases} 0 & \text{if } 1 \le i \le n_1 \\ 1 & \text{if } n_1 < i \le n_1 + n_2 \end{cases}$

and add this term to the model. Then use $b_{ij} = 1$, the unweighted dispersion function. From formula (6.3), using an augmented design matrix $X^*$ with the additional column,

$\mathrm{cov}(\hat\beta_1, \ldots, \hat\beta_p, \hat\beta_{p+1}) = (1/(12(\int f^2)^2))\,(X_c^{*\prime} X_c^*)^{-1}.$
When the covariance matrix of $\hat\beta_1, \ldots, \hat\beta_p$ is determined in the usual way, it is found to agree exactly with (8.1). Thus both methods yield the same asymptotic covariance matrix for the estimates. The omission of the between group comparisons does not affect the efficiency of the estimates.

Example 2 (One Factor Analysis of Variance)  Suppose there are p + 1 groups of observations with sample sizes $n_1, n_2, \ldots, n_{p+1}$, $\sum_k n_k = n$. The usual model is

$Y_i = \beta_0 + \beta_1 x_{i1} + \cdots + \beta_p x_{ip} + e_i$

where

$x_{ij} = \begin{cases} 1 & \text{if } n_1 + \cdots + n_j < i \le n_1 + \cdots + n_{j+1} \\ 0 & \text{otherwise,} \end{cases}$

for $i = 1, \ldots, n$, $j = 1, \ldots, p$. The design matrix is

$X = \begin{pmatrix} 0 & 0 & \cdots & 0 \\ 1 & 0 & \cdots & 0 \\ 0 & 1 & \cdots & 0 \\ \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & 1 \end{pmatrix}$

where 0 and 1 are column vectors of appropriate size. This is the usual "shift" model with location parameters $\beta_0, \beta_0 + \beta_1, \ldots, \beta_0 + \beta_p$ for the p + 1 groups.

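Suppose that for the dispersion function D we use weights

$b_{ij} = \begin{cases} 1 & \text{if subscripts } i \text{ and } j \text{ are from different groups} \\ 0 & \text{if subscripts } i \text{ and } j \text{ are from the same group.} \end{cases}$

Then the weight matrix is

$B_n = \begin{pmatrix} m_1 I & -J & \cdots & -J \\ -J & m_2 I & \cdots & -J \\ \vdots & \vdots & \ddots & \vdots \\ -J & -J & \cdots & m_{p+1} I \end{pmatrix}$

where $m_i = \sum_{k \ne i} n_k$ is the number of observations outside of the ith group and I and J are appropriately sized identity and unit matrices. With this choice of weights, only the between group comparisons are used.

Formula (6.2) gives the covariance matrix of $\hat\beta$ using the above $B_n$ matrix.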
As an alternative, all comparisons could be made with $b_{ij} = 1$. Then the weight matrix is a multiple of $B_n^* = I - (1/n)J$. It can be shown that the covariance matrix of $\hat\beta$ is the same using $B_n^*$ as it is using $B_n$. This shows there is no loss of efficiency in dropping the within group comparisons. In fact, the estimates $\hat\beta$ are identical for these two cases since the terms involving within group comparisons do not involve any parameters and cannot affect the minimization process.

Another interesting point will be illustrated with a special case. Suppose p = 3 and there are equal sample sizes $n_k = m$, k = 1, 2, 3, 4. Suppose that comparisons are only made between adjacent groups in the dispersion function; that is, groups 1 vs 2, 2 vs 3 and 3 vs 4. The corresponding weight matrix is

$B_n = \begin{pmatrix} mI & -J & 0 & 0 \\ -J & 2mI & -J & 0 \\ 0 & -J & 2mI & -J \\ 0 & 0 & -J & mI \end{pmatrix}.$

By direct computation,

$(X' B_n X)^{-1}(X' B_n B_n X)(X' B_n X)^{-1} = (1/m) \begin{pmatrix} 2 & 1 & 1 \\ 1 & 2 & 1 \\ 1 & 1 & 2 \end{pmatrix}.$

If all comparisons are made, $B^* = I - (1/4m)J$ and

$(X' B^* X)^{-1}(X' B^* B^* X)(X' B^* X)^{-1} = (1/m) \begin{pmatrix} 2 & 1 & 1 \\ 1 & 2 & 1 \\ 1 & 1 & 2 \end{pmatrix}.$

Thus the approach with adjacent group comparisons yields the same covariance matrix as the usual approach with all comparisons. This suggests that there is considerable redundancy in making all comparisons.

The  possibility  of  restricting  the  number  of  comparisons,  without 
a  loss  of  efficiency,  may  be  especially  useful  in  more  complicated  fixed 
effect  designs.  Such  designs  can  be  viewed  in  terms  of  a  one-way  layout 
with  the  parameters  of  interest  being  contrasts  in  the  location  parameters 
of  the  groups. 
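The four-group special case of Example 2 can be reproduced numerically. The sketch below builds the one-way design with m = 2 observations per group (an arbitrary illustrative choice), forms the weight matrices for adjacent-group comparisons, between-group comparisons and all comparisons, and evaluates the matrix part of (6.2) for each in exact arithmetic. Function names are hypothetical.

```python
from fractions import Fraction

def matmul(A, B):
    return [[sum(a * c for a, c in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def inverse(A):
    """Gauss-Jordan inverse in exact rational arithmetic."""
    n = len(A)
    M = [[Fraction(v) for v in row] + [Fraction(int(i == j)) for j in range(n)]
         for i, row in enumerate(A)]
    for c in range(n):
        pivot_row = next(r for r in range(c, n) if M[r][c] != 0)
        M[c], M[pivot_row] = M[pivot_row], M[c]
        pivot = M[c][c]
        M[c] = [v / pivot for v in M[c]]
        for r in range(n):
            if r != c:
                factor = M[r][c]
                M[r] = [vr - factor * vc for vr, vc in zip(M[r], M[c])]
    return [row[n:] for row in M]

def weight_matrix(b):
    """B_n: off-diagonal entries -b_ij, diagonal sum_{j != i} b_ij."""
    n = len(b)
    return [[sum(b[i][l] for l in range(n) if l != i) if i == j else -b[i][j]
             for j in range(n)] for i in range(n)]

def sandwich(X, B):
    """(X'BX)^{-1} (X'BBX) (X'BX)^{-1}, the matrix part of (6.2)."""
    Xt = transpose(X)
    bread = inverse(matmul(matmul(Xt, B), X))
    meat = matmul(matmul(matmul(Xt, B), B), X)
    return matmul(matmul(bread, meat), bread)

m = 2                                              # observations per group
groups = [g for g in range(4) for _ in range(m)]   # group labels 0, 1, 2, 3
n = len(groups)
X = [[int(g == k) for k in (1, 2, 3)] for g in groups]   # indicators of groups 2-4

def pair_weights(rule):
    """b_ij = 1 when rule(group_i, group_j) holds and i != j, else 0."""
    return [[int(i != j and rule(groups[i], groups[j])) for j in range(n)]
            for i in range(n)]

adjacent = pair_weights(lambda g, h: abs(g - h) == 1)
between = pair_weights(lambda g, h: g != h)
everything = pair_weights(lambda g, h: True)

target = [[Fraction(2 if k == l else 1, m) for l in range(3)] for k in range(3)]
for b in (adjacent, between, everything):
    print(sandwich(X, weight_matrix(b)) == target)
```

Each of the three weight choices should print `True`, matching $(1/m)$ times the matrix with 2 on the diagonal and 1 elsewhere.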
9. APPENDIX

Proof of Theorem 4.1

Let $w_i = \beta_1 x_{i1} + \cdots + \beta_p x_{ip}$, $i = 1, \ldots, n$, and $\bar w = \sum_i w_i / n$. Let $\theta = (\theta_1, \ldots, \theta_p)'$ be a fixed p x 1 vector and without loss of generality in what is to follow take $\theta'\theta = 1$. Consider a linear combination $U = U(\beta) = \theta' T(\beta) = \sum_{i<j} h_{ij}\,\phi(Z_i, Z_j)$, where $h_{ij} = \theta_1 a_{ij}(1) + \cdots + \theta_p a_{ij}(p)$, $1 \le i < j \le n$. Let $h_{i\cdot} = \sum_{j=i+1}^{n} h_{ij}$, $h_{\cdot j} = \sum_{i=1}^{j-1} h_{ij}$, $h_{n\cdot} = 0$, $h_{\cdot 1} = 0$, $h_{\cdot\cdot} = \sum_{i<j} h_{ij}$ and

$H_i = h_{\cdot i} - h_{i\cdot} = \theta_1 A_i(1) + \cdots + \theta_p A_i(p).$

The theorem will follow if it is shown that $U(0)$ is asymptotically normal with mean $\sum_i H_i (w_i - \bar w)(\int f^2) + (h_{\cdot\cdot}/2)$ and variance $\sum_i H_i^2 / 12$ for any choice of $\theta$. To show this, Theorem 4 of Sievers (1978) can be directly applied once the four assumptions there are verified.

The first assumption requires $\max_{1 \le i \le n} (w_i - \bar w)^2 \to 0$. This follows from Assumption (A3).

The second assumption requires $\sum_i (w_i - \bar w)^2 \to \sigma_w^2 > 0$. This follows from Assumption (A4) using $\sum_i (w_i - \bar w)^2 = \Delta' X_c' X_c \Delta / n$.

The third assumption requires $\sum_i H_i^2 / \max_i H_i^2 \to \infty$.
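To examine this, write

$\dfrac{\sum_i H_i^2}{\max_i H_i^2} = \dfrac{\sum_i [A_i^2(1) + \cdots + A_i^2(p)]}{\max_i [\theta_1 A_i(1) + \cdots + \theta_p A_i(p)]^2} \cdot \dfrac{\sum_i [\theta_1 A_i(1) + \cdots + \theta_p A_i(p)]^2}{\sum_i [A_i^2(1) + \cdots + A_i^2(p)]} = F_1 F_2$  say.

Now

$F_2 = \dfrac{\theta' V_n \theta}{\mathrm{tr}(V_n)} \ge \min_{\theta'\theta = 1} \dfrac{\theta' V_n \theta}{\mathrm{tr}(V_n)} = \dfrac{\lambda_{1n}}{\lambda_{1n} + \cdots + \lambda_{pn}},$

where $\lambda_{1n} \le \lambda_{2n} \le \cdots \le \lambda_{pn}$ are the ordered eigenvalues of $V_n$. Let $\lambda_1 \le \lambda_2 \le \cdots \le \lambda_p$ denote the ordered eigenvalues of V. Then by Assumption (A5)

$\dfrac{\lambda_{1n}}{\lambda_{1n} + \cdots + \lambda_{pn}} = \dfrac{\gamma_n \lambda_{1n}}{\gamma_n \lambda_{1n} + \cdots + \gamma_n \lambda_{pn}} \to \dfrac{\lambda_1}{\lambda_1 + \cdots + \lambda_p} > 0$

as $n \to \infty$ and this shows $F_2$ is bounded from below away from zero.

Now with the Cauchy-Schwarz inequality used in the denominator

$F_1 \ge \dfrac{\sum_i [A_i^2(1) + \cdots + A_i^2(p)]}{\max_i [A_i^2(1) + \cdots + A_i^2(p)]}.$

Then with Assumption (A2), it follows that $G_1 \to 0$ as $n \to \infty$.

Now $G_2 = 1/F_2$ and $F_2$ was shown to be bounded away from zero. Thus $G_2$ is bounded and with the behavior of $G_1$ the fourth assumption is met.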

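Proof of Lemma 5.1

It is sufficient to show that the result holds for each coordinate of $R(\Delta)$. The first coordinate, say $R_1(\Delta)$, will be considered. Let $t_{ij}(\Delta) = \Delta_1(x_{j1} - x_{i1})/\sqrt{n} + \cdots + \Delta_p(x_{jp} - x_{ip})/\sqrt{n}$. Then $R_1(\Delta)$ can be written

$R_1(\Delta) = n^{-3/2}\Big[\sum_{i<j} a_{ij}(1)\, W_{ij} + (\int f^2) \sum_{i<j} a_{ij}(1)\, t_{ij}(\Delta)\Big],$

where

$W_{ij} = \phi(Z_i, Z_j) - \phi(Y_i, Y_j) = \begin{cases} +1 & \text{if } t_{ij}(\Delta) < Y_j - Y_i < 0 \\ -1 & \text{if } 0 < Y_j - Y_i < t_{ij}(\Delta) \\ 0 & \text{otherwise.} \end{cases}$

Actually $W_{ij}$ can be $\pm 1/2$ when ties occur but this will be ignored since such events have zero probability.

In both cases $t_{ij}(\Delta) > 0$ and $t_{ij}(\Delta) < 0$, $E(W_{ij}) = G(0) - G(t_{ij}(\Delta)) = -t_{ij}(\Delta)\, g(\xi_{ij}(\Delta))$, where $|\xi_{ij}(\Delta)| \le |t_{ij}(\Delta)|$.

Then using $g(0) = \int f^2$,

$E(R_1(\Delta)) = n^{-3/2} \sum_{i<j} a_{ij}(1)\, t_{ij}(\Delta)\,[g(0) - g(\xi_{ij}(\Delta))].$

From Assumptions (A3) and (A6), it follows that for any $\varepsilon > 0$, $\max_{i<j} |g(0) - g(\xi_{ij}(\Delta))| < \varepsilon$ uniformly in $\Delta \in \mathcal{D}$ for n sufficiently large. Further, noting that $\sum_{i<j} (x_{jk} - x_{ik})(x_{j\ell} - x_{i\ell})/n^2 = \sum_i (x_{ik} - \bar x_k)(x_{i\ell} - \bar x_\ell)/n$, write $\sum_{i<j} t_{ij}^2(\Delta)/n = \Delta' X_c' X_c \Delta / n$. Then, with use of the Cauchy-Schwarz inequality,

$|E(R_1(\Delta))| \le \varepsilon\, \Big[n^{-2}\binom{n}{2}\Big]^{1/2} \Big\{\sum_{i<j} a_{ij}^2(1) \Big/ \binom{n}{2}\Big\}^{1/2} \{\Delta' X_c' X_c \Delta / n\}^{1/2}.$

With Assumptions (A4) and (A7) and the fact that $\varepsilon$ is arbitrary, it must be that $E(R_1(\Delta)) \to 0$ as $n \to \infty$, uniformly in $\Delta \in \mathcal{D}$.

The variance of $R_1(\Delta)$ is

$\mathrm{Var}(R_1(\Delta)) = n^{-3} \sum_{i<j} a_{ij}^2(1)\, \mathrm{Var}(W_{ij}) + n^{-3} \sum_{i<j} \sum_{k<\ell,\ (i,j) \ne (k,\ell)} a_{ij}(1)\, a_{k\ell}(1)\, \mathrm{Cov}(W_{ij}, W_{k\ell}).$

Using Assumptions (A3) and (A7), it can be shown that $\mathrm{Var}(R_1(\Delta)) \to 0$ as $n \to \infty$, uniformly in $\Delta \in \mathcal{D}$. With the mean and variance tending to zero, the lemma follows.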




Proof  of  Theorem  5.1 

It is sufficient to show that the result holds for each coordinate of $R(\Delta)$. Suppose $R_1(\Delta)$ is considered. $R_1(\Delta)$ and $t_{ij}(\Delta)$ are given in the preceding proof.

Let $\varepsilon > 0$, $\varepsilon' > 0$ be given.

From Assumptions (A4) and (A7) there exists a bound $B_0$ such that

<1/21  (Ikj  a2tj«)/<5)l1/2  <IE.!  m.l  <*lk  -  V2/»l1/2>  i  “o 
Under Assumptions (A3), (A4), (A6), (A7) and $\beta = 0$ it can be shown that


£«*„)  -  n'3/2  I±<j  alj(l)[G(tJj(m))  -  G(t^(m))] 

<  2g(0)Bo  6 

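for all $m = 1, \ldots, M$ and for n sufficiently large.

Further, it can be shown that

$\mathrm{Var}(Q_m) \to 0$  as $n \to \infty$,

for all $m = 1, \ldots, M$.

Now for each $m = 1, \ldots, M$, choose a point $\Delta_m \in \mathcal{D}_m$. Then note that

$\sup_{\Delta \in \mathcal{D}_m} n^{-3/2} |T_1(\Delta/\sqrt{n}) - T_1(\Delta_m/\sqrt{n})| \le Q_m$

for each $m = 1, \ldots, M$. Further, by Lemma 5.1,

$P(|R_1(\Delta_m)| \ge \varepsilon/3) \le \varepsilon'/2M$

for each $m = 1, \ldots, M$ and for n sufficiently large.

Putting some pieces together, for each $m = 1, \ldots, M$ and for n sufficiently large

$\sup_{\Delta \in \mathcal{D}_m} |R_1(\Delta) - R_1(\Delta_m)|$
$\quad \le \sup_{\Delta \in \mathcal{D}_m} n^{-3/2} |T_1(\Delta/\sqrt{n}) - T_1(\Delta_m/\sqrt{n})| + \sup_{\Delta \in \mathcal{D}_m} n^{-3/2} g(0) \sum_{i<j} |a_{ij}(1)|\, |t_{ij}(\Delta) - t_{ij}(\Delta_m)|$
$\quad \le Q_m + g(0) B_0 \delta$
$\quad = Q_m - E(Q_m) + E(Q_m) + g(0) B_0 \delta$
$\quad \le Q_m - E(Q_m) + \varepsilon/3.$

Further, for each $m = 1, \ldots, M$ and for n sufficiently large

$P(\sup_{\Delta \in \mathcal{D}_m} |R_1(\Delta)| \ge \varepsilon)$
$\quad \le P(\sup_{\Delta \in \mathcal{D}_m} |R_1(\Delta) - R_1(\Delta_m)| + |R_1(\Delta_m)| \ge \varepsilon)$
$\quad \le P(Q_m - E(Q_m) + \varepsilon/3 + |R_1(\Delta_m)| \ge \varepsilon)$
$\quad \le P(Q_m - E(Q_m) \ge \varepsilon/3) + P(|R_1(\Delta_m)| \ge \varepsilon/3)$
$\quad \le (9/\varepsilon^2)\, \mathrm{Var}(Q_m) + \varepsilon'/2M$
$\quad \le \varepsilon'/2M + \varepsilon'/2M$
$\quad = \varepsilon'/M.$

Finally, for n sufficiently large

$P(\sup_{\Delta \in \mathcal{D}} |R_1(\Delta)| \ge \varepsilon) \le \sum_{m=1}^{M} P(\sup_{\Delta \in \mathcal{D}_m} |R_1(\Delta)| \ge \varepsilon) \le \sum_{m=1}^{M} \varepsilon'/M = \varepsilon'$

and the proof is completed.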

Proof of Lemma 6.2

When $(1/n^2) C_n$ in the expression $R(\Delta)$ is replaced by C, the proof of this lemma is routine.

Proof of Lemma 6.3

Note that

$\partial Q(\Delta)/\partial\Delta - \partial D^*(\Delta)/\partial\Delta = 2n^{-3/2}[T(\Delta/\sqrt{n}) - T(0) - n^{3/2} C \Delta]$

and use Lemma 6.2.

Proof of Lemma 6.4

With Lemma 6.3, the proof finishes exactly as in Jaeckel (1972), page 1454.

Proof of Theorem 6.1

Again use exactly the argument of Jaeckel (1972), page 1454, along with Lemma 6.4.